21 research outputs found

    An optimized TOPS+ comparison method for enhanced TOPS models

    This article has been made available through the Brunel Open Access Publishing Fund. Background: Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization, and we call the resulting optimised method the advanced TOPS+ comparison method, i.e. advTOPS+. Results: We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model by considering loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first-class objects, and describing interactions between SSEs, and between SSEs and ligands, by incoming and outgoing arcs, annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method. Conclusions: Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] than our basic TOPS+ method, giving 90 percent accuracy for SCOP alpha+beta, a 6 percent increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.

    Coverage of whole proteome by structural genomics observed through protein homology modeling database

    We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from the genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers an entire ORF product are rare. When the portion of an ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structures are calculated, the fraction of modeled 3D structures of soluble proteins has increased by 5% for archaebacteria and by 7% for eubacteria in the last 3 years. Assuming that this rate is maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes, without the putative disordered regions, will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structures of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting the spatial arrangement of structural domains in an ORF will then be an upcoming issue for structural genomics.

    Cross-Over between Discrete and Continuous Protein Structure Space: Insights into Automatic Classification and Networks of Protein Structures

    Structural classifications of proteins assume the existence of the fold, which is an intrinsic equivalence class of protein domains. Here, we test under which conditions such an equivalence class is compatible with objective similarity measures. We base our analysis on the transitive property of the equivalence relationship, requiring that similarity of A with B and B with C implies that A and C are also similar. Divergent gene evolution leads us to expect that the transitive property should approximately hold. However, if protein domains are a combination of recurrent short polypeptide fragments, as proposed by several authors, then similarity of partial fragments may violate the transitive property, favouring the continuous view of the protein structure space. We propose a measure to quantify the violations of the transitive property when a clustering algorithm joins elements into clusters, and we find that such violations present a well-defined and detectable cross-over point, from an approximately transitive regime at high structure similarity to a regime with large transitivity violations and large differences in length at low similarity. We argue that protein structure space is discrete and hierarchic classification is justified up to this cross-over point, whereas at lower similarities the structure space is continuous and should be represented as a network. We have tested the qualitative behaviour of this measure, varying all the choices involved in the automatic classification procedure, i.e., domain decomposition, alignment algorithm, similarity score, and clustering algorithm, and we have found that this behaviour is quite robust. The final classification depends on the chosen algorithms. We used the values of the clustering coefficient and the transitivity violations to select the optimal choices among those that we tested. Interestingly, this criterion also favours the agreement between automatic and expert classifications.
As a domain set, we have selected a consensus set of 2,890 domains decomposed very similarly in SCOP and CATH. As an alignment algorithm, we used a global version of MAMMOTH developed in our group, which is both rapid and accurate. As a similarity measure, we used the size-normalized contact overlap, and as a clustering algorithm, we used average linkage. The resulting automatic classification at the cross-over point was more consistent than expert ones with respect to the structure similarity measure, with 86% of the clusters corresponding to subsets of either SCOP or CATH superfamilies and fewer than 5% containing domains in distinct folds according to both SCOP and CATH. Almost 15% of SCOP superfamilies and 10% of CATH superfamilies were split, consistent with the notion of fold change in protein evolution. These results were qualitatively robust for all choices that we tested, although we did not try to use alignment algorithms developed by other groups. Folds defined in SCOP and CATH would be completely joined in the regime of large transitivity violations where clustering is more arbitrary. Consistently, the agreement between SCOP and CATH at fold level was lower than their agreement with the automatic classification obtained using as a clustering algorithm, respectively, average linkage (for SCOP) or single linkage (for CATH). The networks representing significant evolutionary and structural relationships between clusters beyond the cross-over point may allow us to perform evolutionary, structural, or functional analyses beyond the limits of classification schemes. These networks and the underlying clusters are available at http://ub.cbm.uam.es/research/ProtNet.ph
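
The transitivity test described in this abstract can be illustrated with a minimal sketch. It is not the authors' measure, whose exact definition is not given here; it assumes a simple threshold on a symmetric pairwise similarity matrix and counts triples in which A is similar to B and B to C but A is not similar to C. The function name `transitivity_violations`, the matrix `SIM`, and the threshold values are all hypothetical.

```python
from itertools import combinations

# Hypothetical symmetric similarity matrix for four domains (values in [0, 1]).
SIM = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.85, 0.6],
    [0.8, 0.85, 1.0, 0.2],
    [0.1, 0.6, 0.2, 1.0],
]


def transitivity_violations(sim, threshold):
    """Count transitivity violations among all triples at a given threshold.

    Two items are treated as "similar" when sim[i][j] >= threshold.
    Returns (violations, connected_triples), where a connected triple has at
    least two similar pairs, and a violation has exactly two (the third pair
    fails, so similarity is not transitive for that triple).
    """
    violations = 0
    connected = 0
    for a, b, c in combinations(range(len(sim)), 3):
        similar_pairs = sum(
            sim[i][j] >= threshold for i, j in ((a, b), (b, c), (a, c))
        )
        if similar_pairs >= 2:
            connected += 1
            if similar_pairs == 2:
                violations += 1
    return violations, connected


if __name__ == "__main__":
    # Scanning the threshold from high to low shows where violations appear.
    for t in (0.8, 0.5):
        v, n = transitivity_violations(SIM, t)
        print(f"threshold={t}: {v} violations in {n} connected triples")
```

Under these assumptions, plotting the fraction of violating triples against the threshold is one way to look for a cross-over of the kind the abstract describes; the brute-force loop over triples is cubic in the number of domains, so it is only practical for small sets.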

    Modulation of renal-specific oxidoreductase/myo-inositol oxygenase by high-glucose ambience

    Biological properties of renal-specific oxidoreductase (RSOR), characteristics of its promoter, and underlying mechanisms regulating its expression in diabetes were analyzed. RSOR expression, normally confined to the renal cortex, was markedly increased and extended into the outer medullary tubules in db/db mice, a model of type 2 diabetes. Exposure of LLCPK cells to d-glucose resulted in a dose-dependent increase in RSOR expression and its enzymatic activity. The latter was related to one of the glycolytic enzymes, myo-inositol oxygenase. The increase in activity was in proportion to serum glucose concentration. RSOR expression also increased in cells treated with various organic osmolytes, e.g., sorbitol, myo-inositol, and glycerolphosphoryl-choline, and with H2O2. Basal promoter activity was confined to –1,252 bp upstream of ATG, and it increased with high-glucose and osmolyte treatment. EMSAs indicated increased binding activity with osmotic-, carbohydrate-, and oxidant-response elements in cells treated with high glucose, which was abolished by competitors. Supershifts, detected by anti-nuclear factor of activated T cells, and carbohydrate-response-element-binding protein established the binding specificity. Nuclear factor of activated T cells tonicity-enhancer-binding protein and carbohydrate-response-element-binding protein had increased nuclear expression in cells treated with high glucose. The activity of the osmotic-response element exhibited a unique alternate binding pattern, as yet unreported in osmoregulatory genes. The data indicate that RSOR activity is modulated by diverse mechanisms, and that it is endowed with dual properties: to channel glucose intermediaries, characteristic of hepatic aldehyde reductases, and to maintain osmoregulation, a function of renal medullary genes, e.g., aldose reductase, in diabetes.

    The NMR solution structure of the 30S ribosomal protein S27e encoded in gene RS27_ARCFU of Archaeoglobus fulgidis reveals a novel protein fold

    The Archaeoglobus fulgidis gene RS27_ARCFU encodes the 30S ribosomal protein S27e. Here, we present the high-quality NMR solution structure of this archaeal protein, which comprises a C4 zinc finger motif of the CX2CX14-16CX2C class. S27e was selected as a target of the Northeast Structural Genomics Consortium (target ID: GR2), and its three-dimensional structure is the first representative of a family of more than 116 homologous proteins occurring in eukaryotic and archaeal cells. As a salient feature of its molecular architecture, S27e exhibits a β-sandwich consisting of two three-stranded sheets with topology B(↓), A(↑), F(↓), and C(↑), D(↓), E(↑). Due to the uniqueness of the arrangement of the strands, the resulting fold was found to be novel. Residues that are highly conserved among the S27 proteins allowed identification of a structural motif of putative functional importance; a conserved hydrophobic patch may well play a pivotal role in the functioning of S27 proteins, be it in archaeal or eukaryotic cells. The structure of human S27, which possesses a 26-residue amino-terminal extension when compared with the archaeal S27e, was modeled on the basis of two structural templates: S27e for the carboxy-terminal core and the amino-terminal segment of the archaeal ribosomal protein L37Ae for the extension. Remarkably, the electrostatic surface properties of the archaeal and human proteins are predicted to be entirely different, pointing either to functional variations among archaeal and eukaryotic S27 proteins or, assuming that the function remained invariant, to a concerted evolutionary change of the surface potential of proteins interacting with S27.